Simple Unsupervised Morphology Analysis Algorithm (SUMAA)

Authors

  • Minh Thang Dang
  • Saad Choudri
Abstract

SUMAA is a hybrid algorithm based on letter successor varieties for entirely unsupervised morphological analysis. Using language pattern and structural recognition, it works well on both isolated and agglutinative languages. This paper gives a detailed analysis of how we developed SUMAA. The F-measures (MorphoChallenge, 2005) achieved by SUMAA for the English, Finnish and Turkish datasets were 51.83%, 60.18% and 55.94%, respectively.


Similar resources

Unsupervised Concept Discovery In Hebrew Using Simple Unsupervised Word Prefix Segmentation for Hebrew and Arabic

Fully unsupervised pattern-based methods for discovery of word categories have been proven to be useful in several languages. The majority of these methods rely on the existence of function words as separate text units. However, in morphology-rich languages, in particular Semitic languages such as Hebrew and Arabic, the equivalents of such function words are usually written as morphemes attache...


Statistical Stemming for Kannada

Stemming is a process that groups morphologically related words into the same class and is widely used in information retrieval for improving recall rate. Here we study a set of statistical stemmers for Kannada, a resource-poor language with highly inflectional and agglutinative morphology. We compare stemming using simple truncation, clustering and an unsupervised morpheme segmentation algorit...


Unsupervised Learning of Morphology by using Syntactic Categories

This paper presents a method for unsupervised learning of morphology that exploits the syntactic categories of words. Previous research [4][12] on learning of morphology and syntax has shown that both kinds of knowledge affect each other making it possible to use one type of knowledge to help the other. In this work, we make use of syntactic information i.e. Part-of-Speech (PoS) tags of words t...


Unsupervised Learning of Naïve Morphology with Genetic Algorithms

The morphological lexicon is an important part of NLP systems, which is typically hand-written with the help of linguist experts. Even a partial automation of this process could decrease the cost of the lexicon, being of theoretical importance for languages and dialects which have not been well analysed yet. In this work we describe an attempt to use the minimal description length (MDL) as the one ...


Inducing the Morphological Lexicon of a Natural Language from Unannotated Text

This work presents an algorithm for the unsupervised learning, or induction, of a simple morphology of a natural language. A probabilistic maximum a posteriori model is utilized, which builds hierarchical representations for a set of morphs, which are morpheme-like units discovered from unannotated text corpora. The induced morph lexicon stores parameters related to both the “meaning” and “form...




Publication date: 2006